doc: v3api rfc #2675

xiang90 · 2015-04-14T03:19:25Z

This is a real Request For Comments thing.

All feedbacks are welcomed! Any question are also welcomed!

We have been working really hard on the internal raft. As the new raft getting stable, we are starting to design the new API/store for etcd.

Here are the feature requests from the community that we plan to support:

Optimistic concurrency GoogleCloudPlatform/kubernetes#1957 (comment)
Access to past versions of objects (after history compaction) GoogleCloudPlatform/kubernetes#1957 (comment)
Efficient reliable and recursive watch GoogleCloudPlatform/kubernetes#1957 (comment)
Multi-object transactions #1282 #860 #652
The ability to somehow build multiple indices #860 #1548
Pagination support GoogleCloudPlatform/kubernetes#1957 (comment)
Low per-object memory footprint GoogleCloudPlatform/kubernetes#1957 (comment)
Object TTLs (for events, logically deleted objects) #1232
Allow clients to request watch heartbeats to refresh the clients window #2048
Binary key and value #1548
On-disk storage #732

For #2647

xiang90 · 2015-04-14T03:20:51Z

/cc @derekwaynecarr @tristanz @klizhentas @mauricioabreu @victorpoluceno @ytjohn @ehames @sachja @bgrant0607 @sckott @lavalamp @smarterclayton @technoweenie @hchoudh
@wojtek-t

chancez · 2015-04-14T03:26:03Z

Documentation/rfc/v3api.md

+    raft_term = 0x1,
+    kvs = {
+	    {
+		 key = foo,


odd spacing

smarterclayton · 2015-04-14T15:14:39Z

Documentation/rfc/v3api.md

+  optional uint64 raft_term = 5;
+}
+
+message RangeRequest {


Is the plan for the store to implement MVCC based on index or the version of the key? If so, would a range request be able to filter on a maximum index? I didn't know whether the underlying store would include MVCC, or whether that would be the level above the store (imposed as the last key segment by the etcd simple api).

Range query is done atomically. The underlay store is MVCC, so it is low cost.

So then adding an optional int64 max_index to this struct would not break the underlying model and also allow you to do range queries where you get the latest resource up to index X?

The use case I'm asking about is being able to do consistent paged range queries,

get initial range min_key=start limit=100 max_index=0,

read index from initial range Y and max key Z

get next range min_key=Z max_index=Y limit=100

repeat

@smarterclayton
Let us be more clear here.

foo = bar [index 1] foo = bar00 [index 2] range query all -> foo = bar00; range query (all, max_index=1) -> nothing foo1 = bar [index 3] range query all -> foo = bar00; foo1 = bar 1; range query (all, max_index=2) -> foo = bar

Is this what you want?

We are not going to keep all versions in the key value space.

Instead, You can get the old versions only by scanning the history.
You can range scan the history via watch.

@smarterclayton All the old versions of keys can be retrieved from the history store, which is sorted by logic time order.

On Apr 14, 2015, at 12:03 PM, Xiang Li notifications@github.com wrote:

In Documentation/rfc/v3api.md:

// the cluster id to check with

// if the cluster id is not matched, an error is returned.

optional uint64 cluster_id = 1;
+}

+message ResponseHeader {

// an error type message?

optional string error = 1;

optional uint64 cluster_id = 2;

optional uint64 member_id = 3;

// index of the store when the requested was processed.

optional int64 index = 4;

optional uint64 raft_term = 5;
+}

+message RangeRequest {
@smarterclayton

The current API does not directly support consistent range over multiple requests. You can build your own via range + history.

Do you have a real use case? What are you trying to solve?

For Kubernetes, there are advantages to being able to do ranged snapshot queries to process data in chunks (instead of loading 100k keys in memory at once, make 100 1K range requests). We can do that via buckets in the keyspace, but the larger appeal of range was the consistent chunk aspect.

That being said, it does not break our use cases if that's not available. I believe in order to emulate that we would need delete tombstones in the key space and would have to track the highest modified index in the scan. Do you anticipate/expect there to be delete tombstones in the keyspace?

—
Reply to this email directly or view it on GitHub.

Do you anticipate/expect there to be delete tombstones in the keyspace?

The tombstones is in the event history. Do a range query with max_index. Fill in the gaps via another range query over the history from the max_index to now.

Does that make sense to you?

We can do the consistent query too. But it would make the API a little bit complicated. For each range query, we will return you a range ID and a timeout. Within the timeout, you can get consistent snapshot.

On Apr 14, 2015, at 12:21 PM, Xiang Li notifications@github.com wrote:

In Documentation/rfc/v3api.md:

// the cluster id to check with

// if the cluster id is not matched, an error is returned.

optional uint64 cluster_id = 1;
+}

+message ResponseHeader {

// an error type message?

optional string error = 1;

optional uint64 cluster_id = 2;

optional uint64 member_id = 3;

// index of the store when the requested was processed.

optional int64 index = 4;

optional uint64 raft_term = 5;
+}

+message RangeRequest {
Do you anticipate/expect there to be delete tombstones in the keyspace?

The tombstones is in the event history. Do a range query with max_index. Fill in the gaps via another range query over the history from the max_index to now.

Does that make sense to you

Yes, I think so.
We can do the consistent query too. But it would make the API a little bit complicated. For each range query, we will return you a range ID and a timeout. Within the timeout, you can get consistent snapshot.

I don't want to overcomplicate the store or API, certainly. I think consistent paged range queries are valuable on their own to a wide range of users, as long as it does not break the core consistency guarantees of the store. But I may also be bringing my own assumptions to the discussion. If easy to emulate above, or can lead to complex problems on failover, it doesn't need to be in the API.
—
Reply to this email directly or view it on GitHub.

@smarterclayton Sure. We will keep that in mind. We will have another iteration on the API soon.

xiang90 · 2015-04-21T17:15:34Z

@barakmich Can you take a look? Want you know your opinion on this.

barakmich · 2015-04-21T19:34:35Z

Documentation/rfc/v3api.md

+    - easy for people to write simple etcd application
+
+
+## Protobuf Defined API


Split into a .proto file for further comment (and ultimately copy!)

barakmich · 2015-05-01T20:26:32Z

Documentation/rfc/v3api.proto

+  // LeaseTnx likes Tnx. It does one additional thing that all the keys put into the
+  // store are attached with the given lease. This is rpc is useful when you want to
+  // put a key and attach it to a lease atomically.
+  rpc LeaseTnx(LeaseTnxRequest) returns (LeaseTnxResponse) {}


Should this be separate or be part of the generic Tnx?

i wil explore more. i am not a big fan of this api either.

hmm... after think this for a while, i think we should still let the lease know about tnx (a key space operation) instead of letting the key operation knowing about the lease.

barakmich · 2015-05-01T20:27:25Z

Documentation/rfc/v3api.proto

+  rpc LeaseKeepAlive(stream LeaseKeepAliveRequest) returns (stream LeaseKeepAliveResponse) {}
+}
+
+message RequestHeader {


barakmich · 2015-05-12T22:23:49Z

Documentation/rfc/v3api.proto

+  optional ResponseHeader header = 1;
+}
+
+message DeleteRangeRequest {


Is this part of RequestUnion?

xiang90 · 2015-05-18T20:46:55Z

@barakmich I fixed up the proto. Now it compiles. I am going to squash and merge this.

doc: v3api rfc

schmichael · 2015-05-22T21:40:13Z

Looks like I just missed the merge window for this, but I do think #2857 (non-recursive Create/PUT behavior) would be a useful feature.

xiang90 · 2015-05-22T21:43:00Z

@schmichael In v3, there is no dir concept. So there is no recursive Create/PUT actually...

deepakthipeswamy · 2016-10-15T00:35:40Z

Can I store binary objects as values? If yes, is there a C binding which supports this API?

xiang90 added the v3api label Apr 14, 2015

xiang90 mentioned this pull request Apr 14, 2015

v3 api/store design #2647

Closed

chancez reviewed Apr 14, 2015
View reviewed changes

Documentation/rfc/v3api.md

raft_term = 0x1,

kvs = {

{

key = foo,

Copy link

Contributor

chancez Apr 14, 2015

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

odd spacing

xiang90 force-pushed the v3rfc branch 4 times, most recently from 0e5783c to 44182a0 Compare April 14, 2015 03:44

smarterclayton reviewed Apr 14, 2015
View reviewed changes

xiang90 mentioned this pull request Apr 16, 2015

Why etcd only keeps the responses of the most recent 1000 events across all etcd keys? #2695

Closed

schmichael mentioned this pull request Apr 16, 2015

index / events API #2670

Closed

xiang90 force-pushed the v3rfc branch from 217825a to 053517c Compare April 21, 2015 19:22

barakmich reviewed Apr 21, 2015
View reviewed changes

xiang90 force-pushed the v3rfc branch 2 times, most recently from 59573f7 to 41017dc Compare April 21, 2015 21:30

wojtek-t mentioned this pull request Apr 23, 2015

Proposal for High Availability of Daemons kubernetes/kubernetes#6995

Merged

barakmich reviewed May 1, 2015
View reviewed changes

xiang90 mentioned this pull request May 5, 2015

Slow etcd kubernetes/kubernetes#7767

Closed

barakmich reviewed May 12, 2015
View reviewed changes

doc: v3api rfc

7c879ee

xiang90 force-pushed the v3rfc branch from a2b7479 to 7c879ee Compare May 18, 2015 20:48

xiang90 added a commit that referenced this pull request May 18, 2015

Merge pull request #2675 from xiang90/v3rfc

6ee5cd9

doc: v3api rfc

xiang90 merged commit 6ee5cd9 into etcd-io:master May 18, 2015

xiang90 deleted the v3rfc branch May 18, 2015 20:52

barakmich mentioned this pull request May 22, 2015

Create /path/key if-and-only-if /path exists but key does not #2857

Closed

abronan mentioned this pull request Aug 4, 2015

Etcd can't be used a datastore on multihost moby/libnetwork#392

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

doc: v3api rfc #2675

doc: v3api rfc #2675

xiang90 commented Apr 14, 2015

xiang90 commented Apr 14, 2015

chancez Apr 14, 2015

smarterclayton Apr 14, 2015

xiang90 Apr 14, 2015

smarterclayton Apr 14, 2015

smarterclayton Apr 14, 2015

xiang90 Apr 14, 2015

xiang90 Apr 14, 2015

smarterclayton Apr 14, 2015

xiang90 Apr 14, 2015

smarterclayton Apr 14, 2015

xiang90 Apr 14, 2015

xiang90 commented Apr 21, 2015

barakmich Apr 21, 2015

barakmich May 1, 2015

xiang90 May 1, 2015

xiang90 May 11, 2015

barakmich May 1, 2015

barakmich May 12, 2015

xiang90 commented May 18, 2015

schmichael commented May 22, 2015

xiang90 commented May 22, 2015

deepakthipeswamy commented Oct 15, 2016

		- easy for people to write simple etcd application


		## Protobuf Defined API

doc: v3api rfc #2675

doc: v3api rfc #2675

Conversation

xiang90 commented Apr 14, 2015

xiang90 commented Apr 14, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xiang90 commented Apr 21, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xiang90 commented May 18, 2015

schmichael commented May 22, 2015

xiang90 commented May 22, 2015

deepakthipeswamy commented Oct 15, 2016